-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: add return_inverse to pd.unique #24119
Conversation
Hello @h-vetinari! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-10-11 14:19:02 UTC |
Failure is (thankfully) only a flaky hypothesis test. |
d19f073
to
006d7ad
Compare
Codecov Report
@@ Coverage Diff @@
## master #24119 +/- ##
==========================================
- Coverage 92.37% 92.2% -0.18%
==========================================
Files 166 162 -4
Lines 52420 51720 -700
==========================================
- Hits 48423 47688 -735
- Misses 3997 4032 +35
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24119 +/- ##
=========================================
- Coverage 93.07% 92.37% -0.7%
=========================================
Files 192 166 -26
Lines 49551 52439 +2888
=========================================
+ Hits 46119 48441 +2322
- Misses 3432 3998 +566
Continue to review full report at Codecov.
|
Thinks changed around a bit here with As mentioned in the OP, this splits off first chunk of #24108 and makes some progress towards #4087 / #21357 / #21720 / #22824. I'm sure there'll still be lots of discussion, but having an implementation is a good start (even though there's not much happening - the cython backend is already there since a few months). The diff in |
@jreback @jorisvandenbossche @TomAugspurger |
I won't have time in the near-term.
…On Sun, Mar 10, 2019 at 7:55 AM h-vetinari ***@***.***> wrote:
@jreback <https://github.com/jreback> @jorisvandenbossche
<https://github.com/jorisvandenbossche> @TomAugspurger
<https://github.com/TomAugspurger>
Is it possible to give this PR some initial review? It's been lying around
for ~3 months...
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24119 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIpQHnXJQ1HqQj0zox1oTTiECzGoYks5vVQDZgaJpZM4ZFeYy>
.
|
so again why should we add a method to do this, when we already have one?
true numpy calls this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments
closing as stale |
@jreback, this is not stale, but I do see that I overlooked to answer your comment a bit further up. I've made the case in #22824 several times, but among other things, Then there's the fact that most people who are not knee-deep in stats or R won't know Please reopen this and lets have this discussion (bearing in mind that the actual goal would be #24108; this PR is just a stepping stone). |
Thanks for reopening! Will try to merge master soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments on the diff
@@ -94,6 +94,25 @@ of the Series or columns of a DataFrame will also have string dtype. | |||
We recommend explicitly using the ``string`` data type when working with strings. | |||
See :ref:`text.types` for more. | |||
|
|||
|
|||
.. _whatsnew_1000.enhancements.unique: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -355,19 +371,23 @@ def test_factorize_na_sentinel(self, sort, na_sentinel, data, uniques): | |||
else: | |||
tm.assert_extension_array_equal(uniques, expected_uniques) | |||
|
|||
|
|||
class TestUnique: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that no tests are removed here (even though the diff is large). I sometimes joined tests and parametrized them with fixtures. In fact, there should be many more tests now...
assert_series_or_index_or_array_or_categorical_equal(result, expected) | ||
|
||
# TODO: add support for return_inverse to DatetimeArray/DatetimeIndex, | ||
# as well as [[Series/Index].unique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left this here as an indication where things should be heading.
@TomAugspurger @jorisvandenbossche @jreback
|
I am highly u likely to change my -1 on this i view this as duplicative and confusing api if you want to document in an example great |
You had also already agreed to the utility of having an inverse here (at the time you suggested to add the inverse to And as I won't tire of repeating: I feel the conversation keeps going in circles - maybe this could be a nice example case for writing a fully fledged enhancement proposal? |
closing. I don't think anyone has the bandwith to work with you on this. |
git diff upstream/master -u -- "*.py" | flake8 --diff
This is the first part I'm splitting off of #24108, but now with full test coverage. For the moment, I've added
return_inverse
topd.unique
and toCategorical.unique
, but it's not trivial because of inconsistencies like the following:I'd be open to further split off the change for
Categorical.unique
, or just returnNotImplemented
for allExtensionArray
types. As mentioned in #24108 already, I believe that the possibility forreturn_inverse
(or maybe even kwargs in general??) is something that should be added to the EA interface. @TomAugspurger @jreback @jbrockmendel